DESCRIPTION OF DATA SOURCES

  1. Copy of Cage_5_2017_2019:

The file contains a single sheet (Scallop_Cage_Tagged_2017-2018) with a “Date Time” column covering 08/23/2017 to 06/22/2019, a “Temp (°F) #20178444” column with temperature data, and empty columns for “Coupler Detached”, “Coupler Attached”, “Host Connected”, “Stopped”, and “End Of File”.

  2. Copy of Cage_5_GSI_2019:

The file contains three sheets of data: “DATA”, “Event Data”, and “Details”. The “DATA” sheet contains temperature and light intensity data from 06/30/2019 to 07/25/2019. The “Event Data” sheet logs when data was collected, with the columns Date, Button Down, and Button Up. The “Details” sheet contains the device specifications, together with temperature and intensity series that summarize the data in the “DATA” sheet.

  3. Copy of Cage_6_2018_2019:

The file contains a single sheet (scallopcage6) with a “Date Time” column covering 11/15/2018 to 06/22/2019 at 30-minute intervals, a “Temp (°F) #20178444” column with temperature data, and empty columns for “Coupler Detached”, “Coupler Attached”, “Host Connected”, “Stopped”, and “End Of File”. This file holds the data collected from cage number 6.

  4. Copy of GSI Data Sheet:

The file contains two sheets of data, “Lantern Net GSI’s” and “Pivot Table 3”. “Lantern Net GSI’s” contains weight measurements for the scallops, along with the gear they were contained in, their position in the water, and their sex, collected from 07/10/2019 to 08/29/2019. “Pivot Table 3” is a dynamic summary table of the first sheet.

  5. Copy of Net_2_2018_2019:

The file contains temperature data from 11/15/2018 to 06/22/2019, recorded every 30 minutes. It has 10520 rows including the header. The header includes the following columns: “Date Time”, “Temp (°F) #10558683”, “Coupler Detached”, “Coupler Attached”, “Host Connected”, “Stopped”, “End Of File”.
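The stated 30-minute logging interval can be checked once a file has been imported. The following is a minimal sketch in base R, not part of the original cleaning procedure. The helper name check_interval and the date_time values are hypothetical and chosen only for illustration; they are not taken from the client file.

```r
# Report where consecutive logger timestamps are not spaced by the expected interval.
# `timestamps` is a POSIXct vector; `expected_minutes` is the logging interval.
check_interval <- function(timestamps, expected_minutes) {
  gaps <- as.numeric(diff(timestamps), units = "mins")
  # Positions of the gaps that differ from the expected interval
  which(abs(gaps - expected_minutes) > 1e-6)
}

# Hypothetical sample: four readings with the 13:00 reading missing.
date_time <- as.POSIXct(
  c("2018-11-15 12:00", "2018-11-15 12:30", "2018-11-15 13:30", "2018-11-15 14:00"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)

irregular <- check_interval(date_time, expected_minutes = 30)
print(irregular)  # positions of gaps that are not 30 minutes long
```

An empty result would indicate that every reading follows the previous one by exactly the expected interval.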

  6. Copy of Net_5_Bottom_GSI_2019:

It includes three sheets: ‘DATA’, ‘Event Data’, and ‘Details’. The ‘DATA’ sheet includes temperature and luminance intensity data from 06/30/2019 to 08/15/2019, collected every 15 minutes from the bottom of net 5. It has 4454 rows including the header and title. The header includes the following columns: “Date Time, GMT -0400”, “Temp, °F”, “Intensity, lum/ft²”.

The ‘Event Data’ sheet includes the temperature logged for 2 minutes. It has 20 rows including the header and title. The header includes the following columns: “Date Time, GMT -0400”, “Button Down”, “Button Up”, “Host Connect”, “EOF”. The ‘Details’ sheet includes the details for the devices, intensity statistics, and event type.

  7. Copy of Net_5_Top_GSI_Top:

It also includes three sheets: ‘DATA’, ‘Event Data’, and ‘Details’. The ‘DATA’ sheet includes temperature and luminance intensity data from 06/30/2019 to 08/15/2019, collected every 15 minutes from the top of net 5. It has 4454 rows including the header and title. The header includes the following columns: “Date Time, GMT -0400”, “Temp, °F”, “Intensity, lum/ft²”.

The ‘Event Data’ sheet includes the temperature logged for 15 seconds. It has 5 rows including the header and title. The header includes the following columns: “Date Time, GMT -0400”, “Button Down”, “Button Up”, “Host Connect”, “EOF”.

The ‘Details’ sheet includes the details for the devices, temperature and intensity statistics, and event type.

  8. GSI Data Sheet:

This file includes the same data as Copy of GSI Data Sheet, expanded to include the average GSI of the net and cage specimens collected each day. It also reports the average GSI separately for the nets and for the cages. The date range starts at 07/10/2019, as in Copy of GSI Data Sheet, but runs through 10/11/2019.

INTELLECTUAL POLICY CONSTRAINTS

Phoebe and the other stakeholders behind Hurricane Island’s scallop data collection have not, at this point, specified any restrictions or concerns about sharing or publishing the data. However, until further clarification is given, we are working under the assumption that the data will remain confidential to the organization unless we are told otherwise.

META DATA

All files pertaining to the Cage #/Net # contain the same columns, as follows:
  • Temperature - Recorded in Fahrenheit with the device’s temperature probe.
  • Intensity, lum/ft² - A measure of the amount of light sensed at the location of the temperature probe. This can vary with the time of day and with how clear the water is in that particular area.
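The logger files record temperature in Fahrenheit, while the GSI Data Sheet records surface water temperature in Celsius, so comparing the two needs a unit conversion. The following is a minimal sketch of that conversion in base R. The function name fahrenheit_to_celsius and the sample readings are hypothetical and are not taken from the client files.

```r
# Convert temperatures from degrees Fahrenheit to degrees Celsius.
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# Hypothetical logger readings in degrees Fahrenheit.
temp_f <- c(32, 50, 68)
temp_c <- fahrenheit_to_celsius(temp_f)
print(temp_c)  # 0, 10 and 20 degrees Celsius
```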

GSI Data Sheet Info -
  • Surface H20 Temperature - Average water surface temperature measured in Celsius.
  • Gear Type - The designation of what type of Gear the scallops were contained in, either a “Cage” or a “Lantern Net”.
  • Level/Position # - The location that the scallops were kept in, relative to their gear types.
    • Cage - Labeled by 2 character code. First character is T, M, or B meaning Top, Middle, and Bottom, respectively. The Second character is an L or R, meaning Left or Right, respectively. For instance: TL means “Top Left”
    • Lantern Net - Labeled with numerical levels 1-10, each representing the depth of that level of the net. Levels are spaced approximately 1 foot apart: 1 = 1 foot deep, 10 = 10 feet deep.
  • Sex - The biological sex of the specimen, designated as M for “Male” and F for “Female”.
  • Shell Height - Height of the specimen’s shell, measured in mm.
  • Total Viscera Weight (VW) - The total weight of the scallop’s guts after being shucked out of the shell, measured in grams.
  • Meat Weight (MW) - Weight of the main, edible muscle of the scallop, measured in grams.
  • Gonad Weight (GW) - Weight of the reproductive organs of the scallop, measured in grams.
  • Shell Weight (SW) - Weight of scallop shell once the viscera was removed, measured in grams.
  • GSI - Gonadosomatic Index, the gonad mass as a proportion of the total body mass. It is represented by the formula: GSI = [gonad weight / total tissue weight] × 100.
  • Average GSI - The average GSI of the specimens collected that day.
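The cage position codes, the GSI formula, and the daily average described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the client's own calculation: the function names, the column names, and every value in the specimens data frame are hypothetical, and which measured weight serves as the “total tissue weight” is an assumption that should be confirmed with the client.

```r
# Decode a two-character cage position code such as "TL" into a readable label.
decode_cage_position <- function(code) {
  vertical <- c(T = "Top", M = "Middle", B = "Bottom")
  horizontal <- c(L = "Left", R = "Right")
  paste(vertical[substr(code, 1, 1)], horizontal[substr(code, 2, 2)])
}

# GSI = (gonad weight / total tissue weight) * 100
compute_gsi <- function(gonad_weight, total_tissue_weight) {
  gonad_weight / total_tissue_weight * 100
}

# Hypothetical specimens (illustrative values only, weights in grams).
specimens <- data.frame(
  date = as.Date(c("2019-07-10", "2019-07-10", "2019-07-17")),
  position = c("TL", "BR", "ML"),
  gonad_weight = c(2, 3, 4),
  total_tissue_weight = c(20, 20, 25)  # assumed source of "total tissue weight"
)

specimens$position_label <- decode_cage_position(specimens$position)
specimens$gsi <- compute_gsi(specimens$gonad_weight, specimens$total_tissue_weight)

# Average GSI of the specimens collected on each day.
average_gsi <- aggregate(gsi ~ date, data = specimens, FUN = mean)
print(specimens)
print(average_gsi)
```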

ISSUES WITH DATA

RATIONALE FOR REMEDIATING DATA

The steps taken for remediating data are:

SCRIPT OR STEP-BY-STEP DESCRIPTION FOR DATA CLEANING

The following step-by-step procedure sets up a project with clean datasets.

STEPS:
  1. Open RStudio.
  2. Create a new project by choosing the directory.
  3. Set the working directory to the desired directory by selecting Session -> Set Working Directory -> Choose Directory.

Set Working Directory.

  4. Install and load the required packages:
  • install.packages("tidyverse")
  • library(tidyverse)
  5. In the Environment pane, click Import Dataset -> From Excel.

Import Data Set.

  6. Click Browse and select the Excel data file provided by the client.
  7. Select the desired sheet from the Sheet dropdown. (Refer to the Issues With Data section for which sheet to select for each data file.)

Select Sheet.

  8. Verify the data types for all the columns.
  9. Set the data type of the Date columns to Date.
  10. Set the data type of unwanted or empty columns to “Skip”. (Refer to the Issues With Data section for which columns to delete.)

Select Sheet.

  11. Click the Import button.
  12. Run the following code to update the column names so that they contain no spaces, commas, or parentheses. Replace Copy_of_Cage_5_2017_2019 with the name of the imported data frame.
names(Copy_of_Cage_5_2017_2019) <- str_replace_all(names(Copy_of_Cage_5_2017_2019), c(" " = ".", "," = "", "\\(.*\\)" = ""))
  13. Run the following command to write the cleaned data to a CSV file. The file is saved to the current working directory.
write.csv(Copy_of_Cage_5_2017_2019, 'Copy_of_Cage_5_2017_2019_cleaned.csv', row.names = FALSE)
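The column clean-up in the last steps can also be scripted so that it is repeatable across files. The following is a minimal sketch in base R, assuming the sheet has already been imported as a data frame. The helper names drop_empty_columns and clean_column_names, the logger_data values, and the output file name are hypothetical. The renaming applies the same three rules as the str_replace_all() call, written with gsub() so that the example runs without additional packages.

```r
# Drop columns that contain no data (every value is NA).
drop_empty_columns <- function(data) {
  keep <- colSums(!is.na(data)) > 0
  data[, keep, drop = FALSE]
}

# Replace spaces with periods, then remove commas and parenthesised text.
clean_column_names <- function(column_names) {
  cleaned <- gsub(" ", ".", column_names, fixed = TRUE)
  cleaned <- gsub(",", "", cleaned, fixed = TRUE)
  gsub("\\(.*\\)", "", cleaned)
}

# Hypothetical stand-in for an imported sheet ("\u00b0" is the degree sign).
logger_data <- data.frame(
  a = c("2018-11-15 12:00", "2018-11-15 12:30"),
  b = c(48.2, 48.0),
  c = c(NA, NA)
)
names(logger_data) <- c("Date Time", "Temp (\u00b0F) #20178444", "Coupler Detached")

cleaned_data <- drop_empty_columns(logger_data)
names(cleaned_data) <- clean_column_names(names(cleaned_data))
print(names(cleaned_data))  # "Date.Time" "Temp..#20178444"

# Write the cleaned data to a temporary folder for this illustration.
output_file <- file.path(tempdir(), "logger_data_cleaned.csv")
write.csv(cleaned_data, output_file, row.names = FALSE)
```

Dropping all-NA columns in code is an alternative to marking them as “Skip” in the import dialog; either approach should leave the same columns in the cleaned file.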

CONTRIBUTION STATEMENT:

The work for this assignment was distributed as follows:

  • DATA SOURCE DESCRIPTION, INTELLECTUAL POLICY CONSTRAINTS, META DATA - Connor McCoy
  • ISSUES WITH DATA, RATIONALE FOR REMEDIATING DATA - Julie Sunny Mathew
  • STEP-BY-STEP DESCRIPTION FOR DATA CLEANING - Swetha Byluppala

As the role of Proof Reader alternates with every assignment, Swetha Sindhuja Byluppala fulfilled the role of Proof Reader for the Data Cleaning Documentation.

I, Swetha Sindhuja Byluppala, have reviewed this work and agree that it is ready for submission.